Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Noref more permissive gnd uri handling #114

Conversation

karkraeg
Copy link
Member

@karkraeg karkraeg commented Jan 13, 2025

❤️ Thank you for your contribution!

Description

This PR aims at simplifying the process of adding a GND ID to a person in the creatibutors modal. Until now users could only paste the actual GND ID, not one of the GND URIs. When researching a GND URI the Website of the german national library looks like this:

image

So most people will just copy and paste this URI. When then saving or publishing the draft in RDM there will be a "Creators: No valid scheme recognized for identifier." error, which in our opinion is a major inconvenience.

This PR allows for pasting both http://d-nb.info/gnd/<id> and https://d-nb.info/gnd/<id> in addition to GND:<id>, gnd:<id> and <id>. The normalization step is the same as before.

Checklist

Ticks in all boxes and 🟢 on all GitHub actions status checks are required to merge:

idutils/utils.py Outdated
@@ -81,8 +81,10 @@
https://support.orcid.org/hc/en-us/articles/360006897674-Structure-of-the-ORCID-Identifier
"""

gnd_resolver_url = "d-nb.info/gnd/"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think gnd_resolver_url is not needed any more.

It's used at the beginning of the validate function, but I don't that's needed after this code change

def is_gnd(val):

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

its also used at

if val.startswith(gnd_resolver_url):
but I guess the url does not need to be configurble.

idutils/utils.py Outdated
gnd_regexp = re.compile(
r"(gnd:|GND:)?("
rf"(gnd:|GND:|http://{re.escape(gnd_resolver_url)}|https://{re.escape(gnd_resolver_url)})?("
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can use ? instead of repeating http and https like

r"(pmid:|https?://pubmed.ncbi.nlm.nih.gov/)?(\d+)/?$", flags=re.I
.

I would also lean toward putting the url in the regex instead of having a variable

if val.startswith("http://" + gnd_resolver_url):
val = val[len("http://" + gnd_resolver_url) :]
elif val.startswith("https://" + gnd_resolver_url):
val = val[len("https://" + gnd_resolver_url) :]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you just use the regex in the normalize function like

def normalize_pmid(val):

@karkraeg
Copy link
Member Author

I updated the regex to match the one from https://www.wikidata.org/wiki/Property:P227 :
image

Copy link
Contributor

@tmorrell tmorrell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, Thanks!

Comment on lines 239 to 240
if val.startswith("d-nb.info/gnd/"):
val = val[len("d-nb.info/gnd/") :]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: is this actually needed now that the regex contains the URL? I feel this if ... clause would only match identifiers without the http(s) protocol in front, i.e. d-nb.info/gnd/12345. Maybe this logic can be captured in the regex?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, this can probably be removed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed that by chaning the regex accordingly, see https://regex101.com/r/C9VJpH/1

@tmorrell tmorrell merged commit d1bc7ef into inveniosoftware:master Jan 31, 2025
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants